Search | WHO COVID-19 Research Database

Benchmark datasets for SARS-CoV-2 surveillance bioinformatics.

Xiaoli, Lingzi; Hagey, Jill V; Park, Daniel J; Gulvik, Christopher A; Young, Erin L; Alikhan, Nabil-Fareed; Lawsin, Adrian; Hassell, Norman; Knipe, Kristen; Oakeson, Kelly F; Retchless, Adam C; Shakya, Migun; Lo, Chien-Chi; Chain, Patrick; Page, Andrew J; Metcalf, Benjamin J; Su, Michelle; Rowell, Jessica; Vidyaprakash, Eshaw; Paden, Clinton R; Huang, Andrew D; Roellig, Dawn; Patel, Ketan; Winglee, Kathryn; Weigand, Michael R; Katz, Lee S.

PeerJ ; 10: e13821, 2022.

Article in English | MEDLINE | ID: covidwho-2010486

ABSTRACT

Background: Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the cause of coronavirus disease 2019 (COVID-19), has spread globally and is being surveilled with an international genome sequencing effort. Surveillance consists of sample acquisition, library preparation, and whole genome sequencing. This has necessitated a classification scheme detailing Variants of Concern (VOC) and Variants of Interest (VOI), and the rapid expansion of bioinformatics tools for sequence analysis. These bioinformatic tools are means for major actionable results: maintaining quality assurance and checks, defining population structure, performing genomic epidemiology, and inferring lineage to allow reliable and actionable identification and classification. Additionally, the pandemic has required public health laboratories to reach high throughput proficiency in sequencing library preparation and downstream data analysis rapidly. However, both processes can be limited by a lack of a standardized sequence dataset. Methods: We identified six SARS-CoV-2 sequence datasets from recent publications, public databases and internal resources. In addition, we created a method to mine public databases to identify representative genomes for these datasets. Using this novel method, we identified several genomes as either VOI/VOC representatives or non-VOI/VOC representatives. To describe each dataset, we utilized a previously published datasets format, which describes accession information and whole dataset information. Additionally, a script from the same publication has been enhanced to download and verify all data from this study. Results: The benchmark datasets focus on the two most widely used sequencing platforms: long read sequencing data from the Oxford Nanopore Technologies platform and short read sequencing data from the Illumina platform. There are six datasets: three were derived from recent publications; two were derived from data mining public databases to answer common questions not covered by published datasets; one unique dataset representing common sequence failures was obtained by rigorously scrutinizing data that did not pass quality checks. The dataset summary table, data mining script and quality control (QC) values for all sequence data are publicly available on GitHub: https://github.com/CDCgov/datasets-sars-cov-2. Discussion: The datasets presented here were generated to help public health laboratories build sequencing and bioinformatics capacity, benchmark different workflows and pipelines, and calibrate QC thresholds to ensure sequencing quality. Together, improvements in these areas support accurate and timely outbreak investigation and surveillance, providing actionable data for pandemic management. Furthermore, these publicly available and standardized benchmark data will facilitate the development and adjudication of new pipelines.

Future-proofing and maximizing the utility of metadata: The PHA4GE SARS-CoV-2 contextual data specification package.

Griffiths, Emma J; Timme, Ruth E; Mendes, Catarina Inês; Page, Andrew J; Alikhan, Nabil-Fareed; Fornika, Dan; Maguire, Finlay; Campos, Josefina; Park, Daniel; Olawoye, Idowu B; Oluniyi, Paul E; Anderson, Dominique; Christoffels, Alan; da Silva, Anders Gonçalves; Cameron, Rhiannon; Dooley, Damion; Katz, Lee S; Black, Allison; Karsch-Mizrachi, Ilene; Barrett, Tanya; Johnston, Anjanette; Connor, Thomas R; Nicholls, Samuel M; Witney, Adam A; Tyson, Gregory H; Tausch, Simon H; Raphenya, Amogelang R; Alcock, Brian; Aanensen, David M; Hodcroft, Emma; Hsiao, William W L; Vasconcelos, Ana Tereza R; MacCannell, Duncan R.

Gigascience ; 11, 2022 02 16.

Article in English | MEDLINE | ID: covidwho-1692222

ABSTRACT

BACKGROUND: The Public Health Alliance for Genomic Epidemiology (PHA4GE) (https://pha4ge.org) is a global coalition that is actively working to establish consensus standards, document and share best practices, improve the availability of critical bioinformatics tools and resources, and advocate for greater openness, interoperability, accessibility, and reproducibility in public health microbial bioinformatics. In the face of the current pandemic, PHA4GE has identified a need for a fit-for-purpose, open-source SARS-CoV-2 contextual data standard. RESULTS: As such, we have developed a SARS-CoV-2 contextual data specification package based on harmonizable, publicly available community standards. The specification can be implemented via a collection template, as well as an array of protocols and tools to support both the harmonization and submission of sequence data and contextual information to public biorepositories. CONCLUSIONS: Well-structured, rich contextual data add value, promote reuse, and enable aggregation and integration of disparate datasets. Adoption of the proposed standard and practices will better enable interoperability between datasets and systems, improve the consistency and utility of generated data, and ultimately facilitate novel insights and discoveries in SARS-CoV-2 and COVID-19. The package is now supported by the NCBI's BioSample database.

Subject(s)

COVID-19 , SARS-CoV-2 , Genomics , Humans , Metadata , Public Health , Reproducibility of Results

Clinical and Laboratory Findings in Patients With Potential Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) Reinfection, May-July 2020.

Lee, James T; Hesse, Elisabeth M; Paulin, Heather N; Datta, Deblina; Katz, Lee S; Talwar, Amish; Chang, Gregory; Galang, Romeo R; Harcourt, Jennifer L; Tamin, Azaibi; Thornburg, Natalie J; Wong, Karen K; Stevens, Valerie; Kim, Kaylee; Tong, Suxiang; Zhou, Bin; Queen, Krista; Drobeniuc, Jan; Folster, Jennifer M; Sexton, D Joseph; Ramachandran, Sumathi; Browne, Hannah; Iskander, John; Mitruka, Kiren.

Clin Infect Dis ; 73(12): 2217-2225, 2021 12 16.

Article in English | MEDLINE | ID: covidwho-1595231

ABSTRACT

BACKGROUND: We investigated patients with potential severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) reinfection in the United States during May-July 2020. METHODS: We conducted case finding for patients with potential SARS-CoV-2 reinfection through the Emerging Infections Network. Cases reported were screened for laboratory and clinical findings of potential reinfection followed by requests for medical records and laboratory specimens. Available medical records were abstracted to characterize patient demographics, comorbidities, clinical course, and laboratory test results. Submitted specimens underwent further testing, including reverse transcription polymerase chain reaction (RT-PCR), viral culture, whole genome sequencing, subgenomic RNA PCR, and testing for anti-SARS-CoV-2 total antibody. RESULTS: Among 73 potential reinfection patients with available records, 30 patients had recurrent coronavirus disease 2019 (COVID-19) symptoms explained by alternative diagnoses with concurrent SARS-CoV-2 positive RT-PCR, 24 patients remained asymptomatic after recovery but had recurrent or persistent RT-PCR, and 19 patients had recurrent COVID-19 symptoms with concurrent SARS-CoV-2 positive RT-PCR but no alternative diagnoses. These 19 patients had symptom recurrence a median of 57 days after initial symptom onset (interquartile range: 47-76). Six of these patients had paired specimens available for further testing, but none had laboratory findings confirming reinfections. Testing of an additional 3 patients with recurrent symptoms and alternative diagnoses also did not confirm reinfection. CONCLUSIONS: We did not confirm SARS-CoV-2 reinfection within 90 days of the initial infection based on the clinical and laboratory characteristics of cases in this investigation. Our findings support current Centers for Disease Control and Prevention (CDC) guidance around quarantine and testing for patients who have recovered from COVID-19.

Subject(s)

COVID-19 , SARS-CoV-2 , Antibodies, Viral , Humans , Laboratories , Reinfection
